-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Keep walproposer alive until shutdown checkpoint is safe on safekepeers #6712
Conversation
2502 tests run: 2380 passed, 0 failed, 122 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
ba204b0 at 2024-03-11T19:08:26.723Z :recycle: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't check the changes made in the various PG versions due to what seems to be GitHub UI error, but looks OK.
Looks like PG change push is missing in the repo. |
So I can't yet check out the core changes, but I wonder why postmaster treated walproposer as bgw prevously. Looking at SignalSomeChildren/CountChildren it checks only bkend_type, which is set to BACKEND_TYPE_WALSND soon after MarkPostmasterChildWalSender... |
Created branches for those and pushed, should be visible now.
The postgres changes show that, but in a nutshell, because walproposer is marked as BACKEND_TYPE_BGWORKER, postmaster never calls IsPostmasterChildWalSender() on it. Postmaster expects that a BACKEND_TYPE_NORMAL process can morph into a walsender, but not a bgworker. |
11b473a
to
ddfd4ef
Compare
There several more things to do:
I'll try to finish it. |
Interesting, locally it already works without any fixes. Checked, this is because after checkpoint is flushed postmaster kills walproposer with SIGUSR2, which sets the latch, and walproposer manages to commit (and send commit_lsn) to safekeeper before this. |
In #6751 I'm reverting the previous change that appears to have made test_pg_regress less stable -- would probably make sense to revert the revert in this PR, along with the walproposer changes. Hopefully together they make the test stable. |
In case this is useful: a failure on a branch where I had an explicit wait_for_flush_lsn before compute stop and logged the before + after LSNs, where it looks like the safekeeper has seen the post-stop LSN, but the pageserver hasn't #6752 (comment) |
PR #6712 might add a delay in graceful shutdown without change to compute -> sk protocol to force sync of control file on safekeeper to persist commit_lsn, so use 'immediate' mode instead; 'fast' one isn't useful for neon anyway.
ddfd4ef
to
3b09bd1
Compare
Mostly did them, now need to merge PG PRs and then this one. @hlinnaka please have a look, I can't request your review :) |
a554700
to
98315c8
Compare
98315c8
to
6f2ae2b
Compare
6f2ae2b
to
09cc433
Compare
09cc433
to
d22a092
Compare
d22a092
to
72bf53b
Compare
af08861
to
47022a6
Compare
The walproposer pretends to be a walsender in many ways. It has a WalSnd slot, it claims to be a walsender by calling MarkPostmasterChildWalSender() etc. But one different to real walsenders was that the postmaster still treated it as a bgworker rather than a walsender. The difference is that at shutdown, walsenders are not killed until the very end, after the checkpointer process has written the shutdown checkpoint and exited. As a result, the walproposer always got killed before the shutdown checkpoint was written, so the shutdown checkpoint never made it to safekeepers. That's fine in principle, we don't require a clean shutdown after all. But it also feels a bit silly not to stream the shutdown checkpoint. It could be useful for initializing hot standby mode in a read replica, for example. Change postmaster to treat background workers that have called MarkPostmasterChildWalSender() as walsenders. That unfortunately requires another small change in postgres core. After doing that, walproposers stay alive longer. However, it also means that the checkpointer will wait for the walproposer to switch to WALSNDSTATE_STOPPING state, when the checkpointer sends the PROCSIG_WALSND_INIT_STOPPING signal. We don't have the machinery in walproposer to receive and handle that signal reliably. Instead, we mark walproposer as being in WALSNDSTATE_STOPPING always. In commit 568f914, I assumed that shutdown will wait for all the remaining WAL to be streamed to safekeepers, but before this commit that was not true, and the test became flaky. This should make it stable again. Some tests wrongly assumed that no WAL could have been written between pg_current_wal_flush_lsn and quick pg stop after it. Fix them by introducing flush_ep_to_pageserver which first stops the endpoint and then waits till all committed WAL reaches the pageserver. In passing extract safekeeper http client to its own module.
This test occasionally fails with a difference in "pg_xact/0000" file between the local and restored datadirs. My hypothesis is that something changed in the database between the last explicit checkpoint and the shutdown. I suspect autovacuum, it could certainly create transactions. To fix, be more precise about the point in time that we compare. Shut down the endpoint first, then read the last LSN (i.e. the shutdown checkpoint's LSN), from the local disk with pg_controldata. And use exactly that LSN in the basebackup. Closes #559
47022a6
to
ba204b0
Compare
The walproposer pretends to be a walsender in many ways. It has a WalSnd slot, it claims to be a walsender by calling MarkPostmasterChildWalSender() etc. But one different to real walsenders was that the postmaster still treated it as a bgworker rather than a walsender. The difference is that at shutdown, walsenders are not killed until the very end, after the checkpointer process has written the shutdown checkpoint and exited.
As a result, the walproposer always got killed before the shutdown checkpoint was written, so the shutdown checkpoint never made it to safekeepers. That's fine in principle, we don't require a clean shutdown after all. But it also feels a bit silly not to stream the shutdown checkpoint. It could be useful for initializing hot standby mode in a read replica, for example.
Change postmaster to treat background workers that have called MarkPostmasterChildWalSender() as walsenders. That unfortunately requires another small change in postgres core.
After doing that, walproposers stay alive longer. However, it also means that the checkpointer will wait for the walproposer to switch to WALSNDSTATE_STOPPING state, when the checkpointer sends the PROCSIG_WALSND_INIT_STOPPING signal. We don't have the machinery in walproposer to receive and handle that signal reliably. Instead, we mark walproposer as being in WALSNDSTATE_STOPPING always.
In commit 568f914, I assumed that shutdown will wait for all the remaining WAL to be streamed to safekeepers, but before this commit that was not true, and the test became flaky. This should make it stable again.
Closes: #559